Introduction

The purpose of this project is to provide an easily consumable skill rating for professional Call of Duty players. The rating can serve as a descriptive statistic, but it can also be used to predict the winner of the largest tournament of the year, Call of Duty Champs.


What is the Call of Duty World League?

Call of Duty is a first-person shooter franchise that debuted in 2003. Since then, it has grown into one of the largest multiplayer video game franchises in existence, and a competitive scene has gained traction alongside it. In 2016, the Call of Duty World League was born: a sponsored league that hosts major tournaments throughout the year for the best players in the world. At these events, pros compete across three game modes to decide the winner of each series. The modes are Hardpoint, Search and Destroy, and a third mode that often changes yearly; for the data covered here, the third mode is Control. Every team in the league consists of 5 players, and each series is a best of 5.

embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s")

Game Modes in the Circuit

Hardpoint

In Hardpoint, the two teams fight over a rotating objective area on the map called the “hardpoint” (or “hill”). A team earns one point for every second it spends holding the hardpoint; if both teams occupy it at the same time, it is contested and neither team scores. Every sixty seconds the hardpoint moves to a new location, so teams must make tactical decisions about when to rotate across the map. The first team to 250 points wins the map.

embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
  use_start_time(6*60 + 35)

Search and Destroy

In Search and Destroy, the two teams play a series of rounds in which each player has only one life; if you die, you are out until the next round. A team wins a round by eliminating the entire opposing team before the time limit or, on offense, by planting the bomb; if the bomb detonates (45 seconds after the plant) without being defused, the offensive team wins the round. The first team to win 6 rounds wins the map.

embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
  use_start_time(18*60 + 39)

Control

In Control, teams alternate between offense and defense across multiple rounds, and each team shares a pool of 30 lives per round. The offensive team wins a round by capturing two points on the map or by eliminating all 30 of the defense’s lives; the defensive team wins by holding those points until time runs out or by eliminating all 30 of the offense’s lives. The first team to win three rounds wins the map.

embed_url("https://www.youtube.com/watch?v=VQC0aZuGBFs&t=2740s") %>%
  use_start_time(45*60 + 40)

Load Packages
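
The original package-loading chunk is not shown here; the list below is inferred from the functions used later in the document (read_csv, clean_names, sqldf, rowid, the tidymodels stack, randomForest, and so on) and should be treated as an approximation rather than the exact original chunk:

```r
library(tidyverse)     # read_csv, dplyr verbs, ggplot2, stringr, tibble
library(janitor)       # clean_names()
library(sqldf)         # SQL-style subsetting of data frames
library(data.table)    # rowid()
library(tidymodels)    # rsample, parsnip, recipes, workflows, yardstick
library(rpart.plot)    # decision tree plots
library(randomForest)  # random forest models
library(ROCR)          # ROC curves
library(vembedr)       # embed_url() for the YouTube clips
```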


Load Data

This project uses official CWL data published on GitHub. The data is organized relatively cleanly, and missing data is explicitly flagged.

proleague2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-07-05-proleague.csv"))
fortworth2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-03-17-fortworth.csv"))
london2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-05-05-london.csv"))
anaheim2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-06-16-anaheim.csv"))
proleagueFinals2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-07-21-proleague-finals.csv"))

# all stats for all major tournaments (EXCEPT CHAMPS) in BO4 (2019)
majors2019 <- rbind(proleague2019, fortworth2019, london2019, anaheim2019, proleagueFinals2019)

# champs will act as our test data; we will try and predict the winner
champs2019 <- read_csv(url("https://raw.githubusercontent.com/Activision/cwl-data/master/data/data-2019-08-18-champs.csv"))

What data are we concerned with?

To assign an overall score to each individual player, we will address Hardpoint, Search and Destroy, and Control separately. Once we have an individual score for each of the three game modes, we can combine them into a final score.

Hardpoint:

1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find each player’s overall win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – alongside k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. hill_time_s – hill time measured in seconds
10. hill_captures – shows activity on the map (MIGHT INCLUDE)
11. hill_defends – shows activity on the map (MIGHT INCLUDE)
12. match_id – helpful for getting rid of missing data

Search and Destroy:

1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find each player’s overall win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – alongside k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. fb_round_ratio – ‘snd_firstbloods’/‘snd_rounds’ (NOT INCLUDED IN BASE DATA SET)
10. bomb_sneak_defuses – sneak defuses often occur in pivotal rounds
11. bomb_plants – good indicator of role (MIGHT INCLUDE)
12. bomb_defuses – good indicator of role (MIGHT INCLUDE)
13. match_id – helpful for getting rid of missing data

Control:

1. player – which player the data corresponds to
2. mode – game mode
3. win – ‘W’ or ‘L’; used to find each player’s overall win/loss ratio
4. k_d – kill/death ratio; used to show overall impact on the map
5. assists – alongside k/d, assists show overall support on the map; higher assists can indicate better teamwork
6. accuracy_percent – player accuracy for each match
7. damage_dealt – total damage done in the map
8. player_spm – score per minute
9. match_id – helpful for getting rid of missing data
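
As a purely hypothetical illustration of the final step, the three per-mode scores could be blended with a weighted mean. The function and weight values below are placeholders for illustration, not the combination actually used in the analysis:

```r
# Hypothetical only: blend the three per-mode scores into one rating.
# The weights are illustrative placeholders, not fitted values.
final_score <- function(hp, snd, ctrl,
                        w = c(hp = 0.4, snd = 0.4, ctrl = 0.2)) {
  unname(w["hp"] * hp + w["snd"] * snd + w["ctrl"] * ctrl)
}

final_score(hp = 1.1, snd = 0.9, ctrl = 1.0)
```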


Data Split
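
Each game mode’s data is split further below using the same pattern: an 80/20 split stratified on the win column, via rsample’s initial_split. The general shape is sketched here for reference, where some_mode_data is a placeholder for any of the per-mode datasets:

```r
# General pattern (sketch): 80% training / 20% testing,
# stratified on the outcome so both splits keep the same win rate.
split_obj <- some_mode_data %>%
  initial_split(prop = 0.8, strata = "win")

train <- training(split_obj)
test  <- testing(split_obj)
```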


Data Cleaning and Organization

The data below covers all of the majors throughout the season except COD Champs, which we reserve as a test set. The raw data from each major is merged into one large dataset, which is then broken up into Hardpoint, SND, and Control datasets.

All Majors 2019 data

# CLEANING
majors2019 <- majors2019 %>% clean_names(.)

# new dataset that contains all of the missing data, just in case
majors2019_missing <- sqldf('SELECT * FROM majors2019 WHERE match_id LIKE "missing%"')

# whole event data, all players and all maps, where player names are organized alphabetically
majors2019 <- majors2019[order(majors2019$player),]

# removes missing values
majors2019 <- sqldf('SELECT * FROM majors2019 WHERE match_id NOT LIKE "missing%"')

# calculates all the players that have played more than 50 games
player_numgames <- count(majors2019, player) %>% subset(., n > 50) %>% remove_cols(n)

# includes all existing data for all players that have played more than 50 games (arbitrary number)
majors2019 <- sqldf('SELECT * FROM majors2019 WHERE player IN player_numgames')

# removes all matches where damage = 0; almost always occurs as a result of data loss
majors2019 <- sqldf('SELECT * FROM majors2019 WHERE damage_dealt != "0"')

# changes W to 1, L to 0
majors2019$win <- ifelse(majors2019$win == "W", 1, 0) %>%
  as.factor()

# assigning a role to each player to allow for more precise comparisons
playerRoles <- majors2019 %>%
  group_by(player) %>%
  count(player, fave_weapon) %>%
  top_n(1, n) %>%
  mutate(role = fave_weapon) %>%
  subset(select = -c(fave_weapon, n))

# replace fav gun with corresponding role
playerRoles$role <- str_replace(playerRoles$role, "Saug 9mm", "1")
playerRoles$role <- str_replace(playerRoles$role, "Maddox RFB", "2")
playerRoles$role <- str_replace(playerRoles$role, "ICR-7", "3")

# making factors
playerRoles$role <- factor(playerRoles$role)

# manual adjustment for player TJHaly
playerRoles <- playerRoles[-c(83), ]

majors2019 <- dplyr::inner_join(playerRoles, majors2019, by = "player")

A player’s role is defined as a sub (1), a flex (2), or an AR (3), inferred from his most common favorite weapon.
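
As an aside, the weapon-to-role mapping could equivalently be written with dplyr::recode instead of chained str_replace calls, which avoids accidental partial matches. This is a sketch of an alternative, not the code used above:

```r
# Hypothetical alternative to the str_replace chain: map each
# favourite-weapon string directly to its role code in one call.
playerRoles$role <- dplyr::recode(as.character(playerRoles$role),
  "Saug 9mm"   = "1",   # sub
  "Maddox RFB" = "2",   # flex
  "ICR-7"      = "3")   # ar
```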

Hardpoint subset

# all 2019 hardpoint data
hp2019 <- sqldf('SELECT player, k_d, role, win, kills, deaths, x, assists, damage_dealt, player_spm, hill_time_s, hill_captures, hill_defends, x2_piece, x3_piece, x4_piece FROM majors2019 WHERE mode == "Hardpoint"')
hp2019 <- hp2019[order(hp2019$player),]

Search and Destroy subset

# all 2019 SND data
snd2019 <- sqldf('SELECT match_id, team, player, role, win, kills, deaths, k_d, assists, damage_dealt, player_spm, bomb_sneak_defuses, bomb_plants, bomb_defuses, snd_rounds, snd_firstbloods, snd_1_kill_round, snd_2_kill_round, snd_3_kill_round, snd_4_kill_round, x2_piece, x3_piece, x4_piece FROM majors2019 WHERE mode == "Search & Destroy"')

# adds new column with fb/round ratio
snd2019 <- add_column(snd2019, fb_round_ratio = snd2019$snd_firstbloods/snd2019$snd_rounds)

# adding a new column with average first bloods for the season
snd2019 <- snd2019 %>%
  group_by(player) %>%
  mutate(fb_avg = mean(snd_firstbloods))

# puts data in alphabetical order
snd2019 <- snd2019[order(snd2019$player),]

Control subset

# all 2019 CONTROL data
control2019 <- sqldf('SELECT player, role, win, k_d, assists, damage_dealt, player_spm, x2_piece, x3_piece, x4_piece, ctrl_firstbloods, ctrl_firstdeaths, ctrl_captures FROM majors2019 WHERE mode == "Control"')
control2019 <- control2019[order(control2019$player),]

Champs 2019 dataset

champs2019 <- champs2019 %>% clean_names(.)
champs2019 <- champs2019[order(champs2019$player),]
champs2019 <- sqldf('SELECT * FROM champs2019 WHERE match_id NOT LIKE "missing%"')
champs2019 <- sqldf('SELECT * FROM champs2019 WHERE damage_dealt != "0"')

# changes W to 1, L to 0
champs2019$win <- ifelse(champs2019$win == "W", 1, 0) %>%
  as.factor()

champs2019 <- dplyr::inner_join(playerRoles, champs2019, by = "player")

Hardpoint CHAMPS subset

# CHAMPS 2019 hardpoint data
hpChamps <- sqldf('SELECT player, k_d, role, win, kills, deaths, x, assists, damage_dealt, player_spm, hill_time_s, hill_captures, hill_defends FROM champs2019 WHERE mode == "Hardpoint"')
hpChamps <- hpChamps[order(hpChamps$player),]

Search and Destroy CHAMPS subset

# CHAMPS 2019 SND data
sndChamps <- sqldf('SELECT player, win, role, k_d, assists, damage_dealt, player_spm, bomb_sneak_defuses, bomb_plants, bomb_defuses, snd_rounds, snd_firstbloods FROM champs2019 WHERE mode == "Search & Destroy"')

# adds new column with fb/round ratio
sndChamps <- add_column(sndChamps, fb_round_ratio = sndChamps$snd_firstbloods/sndChamps$snd_rounds)

# adding a new column with average first bloods for the season
sndChamps <- sndChamps %>%
  group_by(player) %>%
  mutate(fb_avg = mean(snd_firstbloods))

# puts data in alphabetical order
sndChamps <- sndChamps[order(sndChamps$player),]

Control CHAMPS subset

# CHAMPS 2019 CONTROL data
controlChamps <- sqldf('SELECT player, role, win, k_d, assists, damage_dealt, player_spm FROM champs2019 WHERE mode == "Control"')
controlChamps <- controlChamps[order(controlChamps$player),]

Team Hardpoint Data

# getting all necessary data for hardpoint
mergedhp2019 <- sqldf('SELECT match_id, team, player, role, kills, deaths, win, assists, damage_dealt, player_spm, hill_captures, hill_defends FROM majors2019 WHERE mode == "Hardpoint"')

# organizing by each match
mergedhp2019 <- mergedhp2019[order(mergedhp2019$match_id),]

# removing all matches that DON'T include all 10 players
# calculates all the matches that have all 10 players
match_numplayers <- count(mergedhp2019, match_id) %>% subset(., n == 10) %>% remove_cols(n)

# includes matches where all 10 players have existing data
mergedhp2019 <- sqldf('SELECT * FROM mergedhp2019 WHERE match_id IN match_numplayers')


# merge rows so that all the players from each team are one row; expect 800 observations with about 50 variables
test_mergedhp2019 <- mergedhp2019 %>%
  rename(damage = damage_dealt,
         spm = player_spm,
         hillcaptures = hill_captures,
         hilldefends = hill_defends) %>%
   mutate(rn = rowid(match_id, team)) %>% 
   pivot_wider(names_from = rn, values_from = c(win, 
                                                player, 
                                                kills,
                                                deaths,
                                                assists, 
                                                damage, 
                                                spm, 
                                                hillcaptures, 
                                                hilldefends)) %>%
  subset(select = -c(win_2, win_3, win_4, win_5,
                     player_1, player_2, player_3, player_4, player_5)) %>%
  rename(win = win_1)

# team_mergedhp2019 <- test_mergedhp2019 %>%
#   group_by(match_id, team) %>%
#   mutate(kills = sum(kills_1, kills_2, kills_3, kills_4, kills_5),
#          deaths = sum(deaths_1, deaths_2, deaths_3, deaths_4, deaths_5),
#          kd = kills/deaths,
#          assists = sum(assists_1, assists_2, assists_3, assists_4, assists_5),
#          spm = mean(spm_1, spm_2, spm_3, spm_4, spm_5),
#          hillcaptures = sum(hillcaptures_1, hillcaptures_2, hillcaptures_3, hillcaptures_4, hillcaptures_5),
#          hilldefends = sum(hilldefends_1, hilldefends_2, hilldefends_3, hilldefends_4, hilldefends_5),
#          damage = sum(damage_1, damage_2, damage_3, damage_4, damage_5)) %>%
#   subset(select = c(win, kd, assists, spm, hillcaptures, hilldefends, damage))

Team SND Data

# getting all necessary data for search and destroy
team_snd2019 <- sqldf('SELECT match_id, k_d, role, team, player, win, assists, damage_dealt, player_spm, bomb_sneak_defuses, bomb_plants, bomb_defuses, snd_firstbloods, snd_rounds FROM snd2019')

# organizing by each match
team_snd2019 <- team_snd2019[order(team_snd2019$match_id),]

# removing all matches that DON'T include all 10 players
# calculates all the matches that have all 10 players
match_numplayers <- count(team_snd2019, match_id) %>% subset(., n == 10) %>% remove_cols(n)

# includes matches where all 10 players have existing data
team_snd2019 <- sqldf('SELECT * FROM team_snd2019 WHERE match_id IN match_numplayers')


# merge rows so that all the players from each team are one row; expect 800 observations with about 50 variables
team_snd2019 <- team_snd2019 %>%
  rename(kd = k_d,
         damage = damage_dealt,
         spm = player_spm,
         fb = snd_firstbloods, 
         rounds = snd_rounds,
         defuses = bomb_defuses,
         plants = bomb_plants,
         nd = bomb_sneak_defuses) %>%
   mutate(rn = rowid(match_id, team)) %>% 
   pivot_wider(names_from = rn, values_from = c(win, 
                                                player,
                                                kd,
                                                role,
                                                assists, 
                                                damage, 
                                                spm, 
                                                fb, 
                                                rounds,
                                                defuses,
                                                plants,
                                                nd)) %>%
  subset(select = -c(win_2, win_3, win_4, win_5,
                     player_1, player_2, player_3, player_4, player_5,
                     rounds_2, rounds_3, rounds_4, rounds_5,
                     match_id, team)) %>%
  rename(win = win_1) %>%
  rename(rounds = rounds_1)



# team_snd2019 <- team_snd2019 %>%
#   group_by(match_id, team) %>%
#   mutate(kills = sum(kills_1, kills_2, kills_3, kills_4, kills_5),
#          deaths = sum(deaths_1, deaths_2, deaths_3, deaths_4, deaths_5),
#          kd = kills/deaths,
#          assists = sum(assists_1, assists_2, assists_3, assists_4, assists_5),
#          spm = mean(spm_1, spm_2, spm_3, spm_4, spm_5),
#          damage = sum(damage_1, damage_2, damage_3, damage_4, damage_5),
#          fb = sum(fb_1, fb_2, fb_3, fb_4, fb_5),
#          fbratio = fb/rounds_1,
#          plants = sum(plants_1, plants_2, plants_3, plants_4, plants_5),
#          defuses = sum(defuses_1, defuses_2, defuses_3, defuses_4, defuses_5),
#          nd = sum(nd_1, nd_2, nd_3, nd_4, nd_5)) %>%
#   subset(select = c(win, role, kd, kills, deaths, assists, rounds_1, damage, fbratio, plants, defuses, nd))

Exploratory Data Analysis

For the exploratory data analysis, I will use only the regular-season data; the Champs data is excluded.

Kill/death for season

ggplot(majors2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "OVERALL Player K/D's, 2019 Season (BO4), Descending")

ggplot(hp2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for HARDPOINT, 2019 Season (BO4), Descending")

ggplot(snd2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for SEARCH AND DESTROY, 2019 Season (BO4), Descending")

ggplot(control2019, aes(x = reorder(player, k_d), y = k_d)) + geom_boxplot() + coord_flip(ylim = c(0, 3.5)) + labs(y = "Kill/death ratio", x = "Player", subtitle = "Player K/D's for CONTROL, 2019 Season (BO4), Descending")

Search and Destroy First Bloods

Search and Destroy is a game mode with multiple rounds in which every player has only one life. A “first blood” is the first kill of a round and is usually highly influential. It is a common stat for commentators and the community alike.

Firstblood average

# player firstblood average for SND 2019

ggplot(snd2019, aes(x = reorder(player, fb_avg), y = fb_avg)) + geom_point() + coord_flip(ylim = c(0, 3)) + labs(y = "Firstblood Average", x = "Player", subtitle = "Player Firstblood Average for SEARCH AND DESTROY, 2019 Season (BO4), Descending")

Firstblood totals

# player firstbloods for SND 2019

ggplot(snd2019, aes(x = reorder(player, snd_firstbloods), y = snd_firstbloods)) + geom_boxplot() + coord_flip(ylim = c(0, 6)) + labs(y = "Firstbloods", x = "Player", subtitle = "Player Firstbloods for SEARCH AND DESTROY, 2019 Season (BO4), Descending")

Firstblood/round

# player firstblood/round for SND 2019

ggplot(snd2019, aes(x = reorder(player, fb_round_ratio), y = fb_round_ratio)) + geom_boxplot() + coord_flip(ylim = c(0, 0.6)) + labs(y = "Firstblood/round ratio", x = "Player", subtitle = "Player Firstblood/Round for SEARCH AND DESTROY, 2019 Season (BO4), Descending")

Overall Damage Dealt

# player damage dealt OVERALL 2019

# removes all entries where damage is 0; this is almost always a result of data loss
majors2019 <- sqldf('SELECT * FROM majors2019 WHERE damage_dealt != "0"')
playerDamage <- sqldf('SELECT player, damage_dealt FROM majors2019 WHERE damage_dealt != "0"')

ggplot(playerDamage, aes(x = reorder(player, damage_dealt), y = damage_dealt)) + geom_boxplot() + coord_flip(ylim = c(0, 10000)) + labs(y = "Damage Dealt", x = "Player", subtitle = "OVERALL Player Damage Dealt, 2019 Season (BO4), Descending")

Overall Score/Min (spm)

# Overall score per minute for 2019 season

ggplot(majors2019, aes(x = reorder(player, player_spm), y = player_spm)) + geom_boxplot() + coord_flip(ylim = c(0, 675)) + labs(y = "Score per minute", x = "Player", subtitle = "OVERALL Player Score per minute, 2019 Season (BO4), Descending")

Number of Wins

# Overall number of wins for 2019 season

playerwins <- sqldf('SELECT player, win FROM majors2019 WHERE win == "1"') # selects all the wins for each player
playerwins <- playerwins %>% count(player) # counts the number of wins per player

ggplot(playerwins, aes(x = reorder(player, n), y = n)) + geom_bar(stat = 'identity') + coord_flip() + labs(y = "Number of Wins", x = "Player", subtitle = "OVERALL Number of Wins per Player, 2019 Season (BO4), Descending")

The top 4 players with the most wins in the season are Slasher, Octane, Kenny, and Enable. Interestingly, all four played on the same team, 100 Thieves, and all tied with 116 wins during the season.

playerwins %>%
  ggplot(aes(x = n)) + geom_histogram(binwidth = 15, color = "black", fill = "white")

The number of wins appears to follow a roughly normal distribution. The left side of the histogram looks slightly more populated, which I hypothesize is due to players who did not play the whole season.
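
One quick way to probe the normality hypothesis (not part of the original analysis, and assuming the playerwins data frame built above) is a Shapiro-Wilk test on the per-player win totals:

```r
# Sketch: Shapiro-Wilk normality test on per-player win counts.
# A small p-value would be evidence against normality.
shapiro.test(playerwins$n)
```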


Models: Hardpoint

I will be trying to predict whether an individual player wins or loses a game based on his statistics in that game.

Splitting Data:

hp2019_wl <- hp2019

set.seed(3068)

hp2019_wlsplit <- hp2019_wl %>%
  initial_split(prop = 0.8, strata = "win")

hp2019_train <- training(hp2019_wlsplit)
hp2019_test <- testing(hp2019_wlsplit)


head(hp2019_train)
##    player  k_d role win kills deaths   x assists damage_dealt player_spm
## 4   Abezy 0.66    1   0    19     29 -10       6         3891      290.9
## 5   Abezy 1.18    1   0    26     22   4       8         4480      393.3
## 12  Abezy 0.88    1   0    22     25  -3      14         4515      322.3
## 15  Abezy 0.76    1   0    19     25  -6       8         4868      295.7
## 16  Abezy 0.83    1   0    20     24  -4       4         3954      269.5
## 20  Abezy 1.33    1   0    28     21   7       7         4733      400.3
##    hill_time_s hill_captures hill_defends x2_piece x3_piece x4_piece
## 4           48             4            5        2        0        0
## 5           55             5           14        2        1        0
## 12          80             9            6        4        0        0
## 15          77             7           11        5        0        0
## 16          35             3            6        2        2        0
## 20          71             6           10        3        0        0
dim(hp2019_train)
## [1] 3551   16
dim(hp2019_test)
## [1] 889  16
prop.table(table(hp2019_train$win))
## 
##         0         1 
## 0.4987328 0.5012672

Model 1: Decision Tree

Creating a general decision tree specification using rpart:

hp_tree_spec <- decision_tree() %>%
  set_engine("rpart")

hp_class_tree_spec <- hp_tree_spec %>%
  set_mode("classification")

hp_class_tree_fit <- hp_class_tree_spec %>%
  fit(win ~ k_d + assists + damage_dealt + player_spm + hill_time_s + hill_captures + hill_defends + x2_piece + x3_piece + x4_piece, data = hp2019_train)

hp_class_tree_fit %>%
  extract_fit_engine() %>%
  rpart.plot()
## Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

Checking confusion matrix and accuracy of the train data:

augment(hp_class_tree_fit, new_data = hp2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1379  601
##          1  392 1179
hp_dt_accuracy <- augment(hp_class_tree_fit, new_data = hp2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
hp_dt_accuracy
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.720

Model 2: Random Forest

# library(randomForest)
# library(datasets)
# library(caret)

set.seed(306)

hp_rf <- randomForest(win ~ k_d + assists + damage_dealt + player_spm + hill_time_s + hill_captures + hill_defends + x2_piece + x3_piece + x4_piece, data = hp2019_train, ntree = 500)

print(hp_rf)
## 
## Call:
##  randomForest(formula = win ~ k_d + assists + damage_dealt + player_spm +      hill_time_s + hill_captures + hill_defends + x2_piece + x3_piece +      x4_piece, data = hp2019_train, ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 26.13%
## Confusion matrix:
##      0    1 class.error
## 0 1300  471   0.2659514
## 1  457 1323   0.2567416
hp_rfAccuracy <- (1300 + 1323) / (1300 + 1323 + 457 + 471)
hp_rfAccuracy
## [1] 0.7386652
importance(hp_rf)
##               MeanDecreaseGini
## k_d                  398.48898
## assists              143.83018
## damage_dealt         229.80142
## player_spm           289.30019
## hill_time_s          301.43364
## hill_captures        105.26674
## hill_defends         165.37544
## x2_piece              87.93843
## x3_piece              38.17524
## x4_piece              13.21070
varImpPlot(hp_rf)

hp_pred1 = predict(hp_rf, type = "prob")

library(ROCR)

hp_perf = prediction(hp_pred1[,2], hp2019_train$win)

# 1. Area under curve
hp_auc = performance(hp_perf, "auc")
hp_auc
## A performance instance
##   'Area under the ROC curve'
# 2. True Positive and Negative Rate
hp_pred3 = performance(hp_perf, "tpr","fpr")

# 3. Plot the ROC curve
plot(hp_pred3, main="ROC Curve for Random Forest", col=2, lwd=2)
abline(a=0, b=1, lwd=2, lty=2, col="gray")

plot(hp_rf)


Model 3: Logistic Regression

Recipe, Engine, and Workflow:

hp_recipe <- recipe(win ~ k_d + assists + damage_dealt + player_spm + hill_time_s + hill_captures + hill_defends + x2_piece + x3_piece + x4_piece, data = hp2019_train)

hp_log_reg <- logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")

hp_log_wkflow <- workflow() %>% 
  add_model(hp_log_reg) %>% 
  add_recipe(hp_recipe)

hp_log_fit <- fit(hp_log_wkflow, hp2019_train)

hp_log_fit %>% 
  tidy()
## # A tibble: 11 × 5
##    term           estimate std.error statistic  p.value
##    <chr>             <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)   -5.00     0.264        -19.0  3.09e-80
##  2 k_d            4.06     0.261         15.5  1.81e-54
##  3 assists        0.183    0.0162        11.3  2.09e-29
##  4 damage_dealt  -0.000254 0.0000594     -4.29 1.82e- 5
##  5 player_spm     0.00175  0.00114        1.53 1.26e- 1
##  6 hill_time_s    0.0189   0.00179       10.5  6.67e-26
##  7 hill_captures  0.0547   0.0222         2.46 1.40e- 2
##  8 hill_defends  -0.206    0.0171       -12.0  2.32e-33
##  9 x2_piece      -0.0422   0.0313        -1.35 1.78e- 1
## 10 x3_piece      -0.213    0.0716        -2.97 2.95e- 3
## 11 x4_piece      -0.330    0.187         -1.76 7.77e- 2

Assessing Model Performance:

predict(hp_log_fit, new_data = hp2019_train, type = "prob")
## # A tibble: 3,551 × 2
##    .pred_0 .pred_1
##      <dbl>   <dbl>
##  1   0.845  0.155 
##  2   0.745  0.255 
##  3   0.242  0.758 
##  4   0.860  0.140 
##  5   0.914  0.0865
##  6   0.342  0.658 
##  7   0.816  0.184 
##  8   0.668  0.332 
##  9   0.707  0.293 
## 10   0.638  0.362 
## # … with 3,541 more rows
augment(hp_log_fit, new_data = hp2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1380  499
##          1  391 1281
augment(hp_log_fit, new_data = hp2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class) %>%
  autoplot(type = "heatmap")

Checking accuracy:

hp_log_reg_acc <- augment(hp_log_fit, new_data = hp2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
hp_log_reg_acc
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.749

Model 4: Support Vector Machine

hp_svm_rbf_spec <- svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab")

hp_svm_rbf_fit <- hp_svm_rbf_spec %>%
  fit(win ~ k_d + assists + damage_dealt + player_spm + 
        hill_time_s + hill_captures + hill_defends + 
        x2_piece + x3_piece + x4_piece, 
      data = hp2019_train)

augment(hp_svm_rbf_fit, new_data = hp2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1394  439
##          1  377 1341
hp_svmAccuracy <- augment(hp_svm_rbf_fit, new_data = hp2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
hp_svmAccuracy
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.770

Comparing Model Performance

hp_accuracies <- c(hp_dt_accuracy$.estimate,
                    hp_rfAccuracy,
                    hp_log_reg_acc$.estimate, 
                    hp_svmAccuracy$.estimate)
hp_accuracies
## [1] 0.7203605 0.7386652 0.7493664 0.7702056

As we can see, the SVM has the highest training accuracy of the four models.
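
Note that all four accuracies above are computed on the training data, so they are likely optimistic. A natural follow-up, sketched here rather than run, is to evaluate the best model on the held-out test split:

```r
# Sketch: accuracy of the fitted SVM on the untouched 20% test split.
augment(hp_svm_rbf_fit, new_data = hp2019_test) %>%
  accuracy(truth = win, estimate = .pred_class)
```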


Models: Search and Destroy

Splitting Data:

set.seed(1)

snd2019_split <- snd2019 %>%
  initial_split(prop = 0.8, strata = "win")

snd2019_train <- training(snd2019_split)
snd2019_test <- testing(snd2019_split)


head(snd2019_train)
## # A tibble: 6 × 25
## # Groups:   player [1]
##   match_id      team  player role  win   kills deaths   k_d assists damage_dealt
##   <chr>         <chr> <chr>  <fct> <fct> <dbl>  <dbl> <dbl>   <dbl>        <dbl>
## 1 737440468739… eUni… Abezy  1     0        11      7  1.57       0         1355
## 2 144010197940… eUni… Abezy  1     0         7      7  1          1         1938
## 3 259560665349… eUni… Abezy  1     0         2      7  0.29       4          796
## 4 146217688303… eUni… Abezy  1     0         7      7  1          0         1275
## 5 177954718572… eUni… Abezy  1     0         9      8  1.12       1         1561
## 6 167778343948… eUni… Abezy  1     0        10      9  1.11       3         1434
## # … with 15 more variables: player_spm <dbl>, bomb_sneak_defuses <dbl>,
## #   bomb_plants <dbl>, bomb_defuses <dbl>, snd_rounds <dbl>,
## #   snd_firstbloods <dbl>, snd_1_kill_round <dbl>, snd_2_kill_round <dbl>,
## #   snd_3_kill_round <dbl>, snd_4_kill_round <dbl>, x2_piece <dbl>,
## #   x3_piece <dbl>, x4_piece <dbl>, fb_round_ratio <dbl>, fb_avg <dbl>
dim(snd2019_train)
## [1] 2791   25
dim(snd2019_test)
## [1] 699  25
prop.table(table(snd2019_train$win))
## 
##        0        1 
## 0.498746 0.501254

Model 1: Decision Tree

Creating a general decision tree specification using rpart:

tree_spec <- decision_tree() %>%
  set_engine("rpart")

class_tree_spec <- tree_spec %>%
  set_mode("classification")

class_tree_fit <- class_tree_spec %>%
  fit(win ~ k_d + role + assists + damage_dealt + 
        player_spm + bomb_sneak_defuses + bomb_plants +
        bomb_defuses + fb_round_ratio + snd_firstbloods + 
        snd_1_kill_round + snd_2_kill_round + snd_3_kill_round + 
        snd_4_kill_round + x2_piece + x3_piece + x4_piece, 
      data = snd2019_train)

class_tree_fit %>%
  extract_fit_engine() %>%
  rpart.plot()
## Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

Checking confusion matrix and accuracy of the train data:

augment(class_tree_fit, new_data = snd2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1099  544
##          1  293  855
class_tree <- augment(class_tree_fit, new_data = snd2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
class_tree
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.700

Model 2: Random Forest

# library(randomForest)
# library(datasets)
# library(caret)

set.seed(306)

rf <- randomForest(win ~ k_d + role + assists + damage_dealt + 
        player_spm + bomb_sneak_defuses + bomb_plants +
        bomb_defuses + fb_round_ratio + snd_firstbloods + 
        snd_1_kill_round + snd_2_kill_round + snd_3_kill_round + 
        snd_4_kill_round + x2_piece + x3_piece + x4_piece, 
      data = snd2019_train,
      ntree = 500)

print(rf)
## 
## Call:
##  randomForest(formula = win ~ k_d + role + assists + damage_dealt +      player_spm + bomb_sneak_defuses + bomb_plants + bomb_defuses +      fb_round_ratio + snd_firstbloods + snd_1_kill_round + snd_2_kill_round +      snd_3_kill_round + snd_4_kill_round + x2_piece + x3_piece +      x4_piece, data = snd2019_train, ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 29.7%
## Confusion matrix:
##      0   1 class.error
## 0 1032 360   0.2586207
## 1  469 930   0.3352395
rfAccuracy <- (1032 + 930) / (1032 + 930 + 360 + 469)
rfAccuracy
## [1] 0.7029738
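
randomForest's printed confusion matrix is built from out-of-bag (OOB) predictions, so the accuracy above is simply one minus the OOB error rate. A check using the counts printed above:

```r
# OOB confusion matrix from print(rf) above (rows = truth, cols = prediction)
cm <- matrix(c(1032, 469, 360, 930), nrow = 2,
             dimnames = list(Truth = c("0", "1"), Prediction = c("0", "1")))

# Accuracy from the matrix agrees with 1 - OOB error rate (29.7%)
rf_acc <- sum(diag(cm)) / sum(cm)
c(rf_acc, 1 - 0.297)
```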
importance(rf)
##                    MeanDecreaseGini
## k_d                     310.6887002
## role                     49.0524625
## assists                 104.8659545
## damage_dealt            216.9071156
## player_spm              237.4322880
## bomb_sneak_defuses        5.8702582
## bomb_plants              52.2779181
## bomb_defuses             24.2341082
## fb_round_ratio           80.0289794
## snd_firstbloods          37.7644198
## snd_1_kill_round         82.4309082
## snd_2_kill_round         55.7139776
## snd_3_kill_round         26.4644834
## snd_4_kill_round          9.5438101
## x2_piece                 41.6299549
## x3_piece                  8.1996818
## x4_piece                  0.8748542
varImpPlot(rf)

pred1 <- predict(rf, type = "prob")

library(ROCR)

perf <- prediction(pred1[, 2], snd2019_train$win)

# 1. Area under the curve (the numeric value is stored in auc@y.values)
auc <- performance(perf, "auc")
auc
## A performance instance
##   'Area under the ROC curve'
# 2. True positive rate vs. false positive rate
pred3 <- performance(perf, "tpr", "fpr")

# 3. Plot the ROC curve
plot(pred3, main = "ROC Curve for Random Forest", col = 2, lwd = 2)
abline(a = 0, b = 1, lwd = 2, lty = 2, col = "gray")
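
ROCR computes the AUC internally, but the same number can be obtained from the rank-sum (Mann-Whitney) view: the probability that a randomly chosen winning performance is scored above a randomly chosen losing one. A self-contained sketch on toy scores (illustrative values, not the model's actual predictions):

```r
# Rank-sum formulation of AUC: P(random positive scores higher than random negative)
auc_rank <- function(scores, labels) {
  r <- rank(scores)  # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy predicted win probabilities and true outcomes
auc_rank(scores = c(0.9, 0.4, 0.35, 0.7, 0.2, 0.6),
         labels = c(1, 1, 0, 1, 0, 0))  # 8/9: positives outrank negatives in 8 of 9 pairs
```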


Model 3: Logistic Regression

Recipe, Engine, and Workflow:

snd_recipe <- recipe(win ~ k_d + role + assists + damage_dealt + 
        player_spm + bomb_sneak_defuses + bomb_plants +
        bomb_defuses + fb_round_ratio + snd_firstbloods + 
        snd_1_kill_round + snd_2_kill_round + snd_3_kill_round + 
        snd_4_kill_round + x2_piece + x3_piece + x4_piece, 
      data = snd2019_train)

log_reg <- logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")

log_wkflow <- workflow() %>% 
  add_model(log_reg) %>% 
  add_recipe(snd_recipe)

log_fit <- fit(log_wkflow, snd2019_train)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
log_fit %>% 
  tidy()
## # A tibble: 19 × 5
##    term                 estimate std.error statistic  p.value
##    <chr>                   <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)         -1.49      0.181       -8.23  1.93e-16
##  2 k_d                  5.22      0.292       17.8   2.93e-71
##  3 role2               -0.141     0.105       -1.34  1.81e- 1
##  4 role3               -0.119     0.132       -0.901 3.67e- 1
##  5 assists              0.375     0.0367      10.2   1.49e-24
##  6 damage_dealt        -0.000364  0.000168    -2.17  3.03e- 2
##  7 player_spm          -0.0168    0.00372     -4.50  6.78e- 6
##  8 bomb_sneak_defuses   0.0981    0.307        0.320 7.49e- 1
##  9 bomb_plants          0.229     0.0547       4.19  2.79e- 5
## 10 bomb_defuses         0.171     0.127        1.35  1.76e- 1
## 11 fb_round_ratio     -11.9       2.80        -4.26  2.04e- 5
## 12 snd_firstbloods      1.44      0.304        4.73  2.30e- 6
## 13 snd_1_kill_round    -0.422     0.0564      -7.48  7.46e-14
## 14 snd_2_kill_round    -1.02      0.115       -8.88  6.69e-19
## 15 snd_3_kill_round    -1.61      0.186       -8.66  4.78e-18
## 16 snd_4_kill_round    -2.65      0.324       -8.18  2.74e-16
## 17 x2_piece             0.206     0.0808       2.54  1.09e- 2
## 18 x3_piece             0.213     0.257        0.829 4.07e- 1
## 19 x4_piece             0.111     0.876        0.126 8.99e- 1
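
The estimates are on the log-odds scale, so exponentiating turns them into odds ratios. For instance, the assists coefficient of 0.375 implies that each additional assist multiplies a player's odds of winning by roughly 1.45, holding the other predictors fixed:

```r
# Selected coefficients from the tidy() output above (log-odds scale)
coefs <- c(k_d = 5.22, assists = 0.375, snd_firstbloods = 1.44)

# Odds ratios: multiplicative change in the odds of winning per one-unit increase
odds_ratios <- round(exp(coefs), 2)
odds_ratios  # assists -> 1.45
```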

Assessing Model Performance:

predict(log_fit, new_data = snd2019_train, type = "prob")
## # A tibble: 2,791 × 2
##    .pred_0 .pred_1
##      <dbl>   <dbl>
##  1   0.556   0.444
##  2   0.716   0.284
##  3   0.696   0.304
##  4   0.635   0.365
##  5   0.758   0.242
##  6   0.816   0.184
##  7   0.461   0.539
##  8   0.732   0.268
##  9   0.409   0.591
## 10   0.438   0.562
## # … with 2,781 more rows
augment(log_fit, new_data = snd2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1138  477
##          1  254  922
augment(log_fit, new_data = snd2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class) %>%
  autoplot(type = "heatmap")

Checking accuracy:

log_reg_acc <- augment(log_fit, new_data = snd2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
log_reg_acc
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.738
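
Accuracy alone can mask asymmetric errors, so it is worth splitting the confusion matrix above into sensitivity and specificity (treating win = 1 as the positive class):

```r
# Logistic regression confusion matrix counts from above
tp <- 922; tn <- 1138; fp <- 254; fn <- 477

sensitivity <- tp / (tp + fn)  # share of actual wins predicted as wins
specificity <- tn / (tn + fp)  # share of actual losses predicted as losses
round(c(sensitivity = sensitivity, specificity = specificity), 3)  # 0.659, 0.818
```

So the model identifies losses (about 82%) more reliably than wins (about 66%).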

Model 4: Support Vector Machine

svm_rbf_spec <- svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab")

svm_rbf_fit <- svm_rbf_spec %>%
  fit(win ~ k_d + role + assists + damage_dealt + 
        player_spm + bomb_sneak_defuses + bomb_plants +
        bomb_defuses + fb_round_ratio + snd_firstbloods + 
        snd_1_kill_round + snd_2_kill_round + snd_3_kill_round + 
        snd_4_kill_round + x2_piece + x3_piece + x4_piece, 
      data = snd2019_train)

augment(svm_rbf_fit, new_data = snd2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction    0    1
##          0 1173  436
##          1  219  963
svmAccuracy <- augment(svm_rbf_fit, new_data = snd2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
svmAccuracy
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.765
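
kernlab's svm_rbf() relies on the Gaussian radial basis kernel, which measures the similarity between two stat lines as exp(-sigma * ||x - y||^2). A minimal sketch of the kernel itself (the sigma value here is arbitrary for illustration; kernlab estimates it from the data by default):

```r
# Gaussian (RBF) kernel: similarity decays with squared distance between feature vectors
rbf_kernel <- function(x, y, sigma = 1) {
  exp(-sigma * sum((x - y)^2))
}

rbf_kernel(c(1, 2), c(1, 2))               # identical inputs -> similarity 1
rbf_kernel(c(1, 2), c(3, 5), sigma = 0.1)  # exp(-0.1 * 13)
```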

Comparing Model Performance

snd_accuracies <- c(class_tree$.estimate,
                    rfAccuracy,
                    log_reg_acc$.estimate, 
                    svmAccuracy$.estimate)
snd_accuracies
## [1] 0.7001075 0.7029738 0.7380867 0.7653171
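
Naming the entries makes the comparison easier to read and lets which.max() pick out the best model programmatically (a small sketch reusing the values above):

```r
# Training accuracies of the four Search and Destroy models
snd_acc <- c(tree = 0.7001075, rf = 0.7029738,
             logistic = 0.7380867, svm = 0.7653171)

best <- names(which.max(snd_acc))
best  # "svm"
```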

As we can see, the SVM model has the highest training accuracy (0.765), followed by logistic regression (0.738).


Models: Control

Splitting Data:

set.seed(1)

control2019_split <- control2019 %>%
  initial_split(prop = 0.8, strata = "win")

control2019_train <- training(control2019_split)
control2019_test <- testing(control2019_split)


head(control2019_train)
##    player role win  k_d assists damage_dealt player_spm x2_piece x3_piece
## 4   Abezy    1   0 1.30      17         6459      386.2        5        1
## 16  Abezy    1   0 0.88       4         5353      215.9        2        0
## 18  Abezy    1   0 1.00       3         2657      285.7        3        0
## 21  Abezy    1   0 0.70       4         3077      225.0        1        0
## 22  Abezy    1   0 0.75       5         3357      217.5        2        1
## 23  Abezy    1   0 0.78       5         4315      215.3        1        1
##    x4_piece ctrl_firstbloods ctrl_firstdeaths ctrl_captures
## 4         0                1                0             4
## 16        0                1                0             2
## 18        0                0                0             1
## 21        0                0                0             1
## 22        0                1                1             1
## 23        0                1                0             2
dim(control2019_train)
## [1] 2122   13
dim(control2019_test)
## [1] 532  13
prop.table(table(control2019_train$win))
## 
##         0         1 
## 0.4995287 0.5004713

Model 1: Decision Tree

Creating a general decision tree specification using rpart:

control_tree_spec <- decision_tree() %>%
  set_engine("rpart")

control_class_tree_spec <- control_tree_spec %>%
  set_mode("classification")

control_class_tree_fit <- control_class_tree_spec %>%
  fit(win ~ k_d + role + assists + damage_dealt + 
        player_spm + x2_piece + x3_piece + x4_piece +
        ctrl_firstbloods + ctrl_firstdeaths + ctrl_captures, 
      data = control2019_train)

control_class_tree_fit %>%
  extract_fit_engine() %>%
  rpart.plot(roundint = FALSE)

Checking the confusion matrix and accuracy on the training data:

augment(control_class_tree_fit, new_data = control2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction   0   1
##          0 742 250
##          1 318 812
control_dt_accuracy <- augment(control_class_tree_fit, new_data = control2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
control_dt_accuracy
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.732

Model 2: Random Forest

control_rf <- randomForest(win ~ k_d + role + assists + damage_dealt + 
        player_spm + x2_piece + x3_piece + x4_piece +
        ctrl_firstbloods + ctrl_firstdeaths + ctrl_captures, 
      data = control2019_train,
      ntree = 500)

print(control_rf)
## 
## Call:
##  randomForest(formula = win ~ k_d + role + assists + damage_dealt +      player_spm + x2_piece + x3_piece + x4_piece + ctrl_firstbloods +      ctrl_firstdeaths + ctrl_captures, data = control2019_train,      ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 24.65%
## Confusion matrix:
##     0   1 class.error
## 0 803 257   0.2424528
## 1 266 796   0.2504708
control_rfAccuracy <- (803 + 796) / (803 + 796 + 257 + 266)
control_rfAccuracy
## [1] 0.7535344
importance(control_rf)
##                  MeanDecreaseGini
## k_d                    250.908459
## role                    35.679172
## assists                102.439185
## damage_dealt           172.520556
## player_spm             222.893690
## x2_piece                62.155880
## x3_piece                22.170488
## x4_piece                 6.380112
## ctrl_firstbloods        28.113423
## ctrl_firstdeaths        31.784533
## ctrl_captures          112.493422
varImpPlot(control_rf)
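
Sorting the Gini importances makes the ranking explicit: k_d, player_spm, and damage_dealt dominate, just as in the Search and Destroy forest. A sketch using the values printed above (rounded):

```r
# MeanDecreaseGini values from importance(control_rf) above
gini <- c(k_d = 250.9, role = 35.7, assists = 102.4, damage_dealt = 172.5,
          player_spm = 222.9, x2_piece = 62.2, x3_piece = 22.2, x4_piece = 6.4,
          ctrl_firstbloods = 28.1, ctrl_firstdeaths = 31.8, ctrl_captures = 112.5)

# Rank predictors from most to least important
head(sort(gini, decreasing = TRUE), 3)  # k_d, player_spm, damage_dealt
```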

control_pred1 <- predict(control_rf, type = "prob")

control_perf <- prediction(control_pred1[, 2], control2019_train$win)

# 1. Area under the curve (the numeric value is stored in control_auc@y.values)
control_auc <- performance(control_perf, "auc")
control_auc
## A performance instance
##   'Area under the ROC curve'
# 2. True positive rate vs. false positive rate
control_pred3 <- performance(control_perf, "tpr", "fpr")

# 3. Plot the ROC curve
plot(control_pred3, main = "ROC Curve for Random Forest", col = 2, lwd = 2)
abline(a = 0, b = 1, lwd = 2, lty = 2, col = "gray")


Model 3: Logistic Regression

Recipe, Engine, and Workflow:

control_recipe <- recipe(win ~ k_d + role + assists + damage_dealt + 
        player_spm + x2_piece + x3_piece + x4_piece +
        ctrl_firstbloods + ctrl_firstdeaths + ctrl_captures, 
      data = control2019_train)

control_log_reg <- logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")

control_log_wkflow <- workflow() %>% 
  add_model(control_log_reg) %>% 
  add_recipe(control_recipe)

control_log_fit <- fit(control_log_wkflow, control2019_train)

control_log_fit %>% 
  tidy()
## # A tibble: 13 × 5
##    term              estimate std.error statistic  p.value
##    <chr>                <dbl>     <dbl>     <dbl>    <dbl>
##  1 (Intercept)      -4.67     0.327      -14.3    3.12e-46
##  2 k_d               4.14     0.316       13.1    2.74e-39
##  3 role2            -0.0116   0.128       -0.0912 9.27e- 1
##  4 role3            -0.0935   0.154       -0.606  5.44e- 1
##  5 assists           0.192    0.0222       8.69   3.75e-18
##  6 damage_dealt     -0.000466 0.0000680   -6.86   7.10e-12
##  7 player_spm        0.000260 0.00176      0.148  8.83e- 1
##  8 x2_piece         -0.124    0.0454      -2.74   6.14e- 3
##  9 x3_piece         -0.0678   0.117       -0.579  5.62e- 1
## 10 x4_piece         -0.142    0.337       -0.422  6.73e- 1
## 11 ctrl_firstbloods  0.0426   0.0906       0.471  6.38e- 1
## 12 ctrl_firstdeaths  0.231    0.0878       2.63   8.58e- 3
## 13 ctrl_captures     0.583    0.0546      10.7    1.42e-26

Assessing Model Performance:

predict(control_log_fit, new_data = control2019_train, type = "prob")
## # A tibble: 2,122 × 2
##    .pred_0 .pred_1
##      <dbl>   <dbl>
##  1  0.0598   0.940
##  2  0.850    0.150
##  3  0.713    0.287
##  4  0.872    0.128
##  5  0.828    0.172
##  6  0.805    0.195
##  7  0.615    0.385
##  8  0.510    0.490
##  9  0.528    0.472
## 10  0.333    0.667
## # … with 2,112 more rows
augment(control_log_fit, new_data = control2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction   0   1
##          0 842 272
##          1 218 790
augment(control_log_fit, new_data = control2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class) %>%
  autoplot(type = "heatmap")

Checking accuracy:

control_log_reg_acc <- augment(control_log_fit, new_data = control2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
control_log_reg_acc
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.769

Model 4: Support Vector Machine

control_svm_rbf_spec <- svm_rbf() %>%
  set_mode("classification") %>%
  set_engine("kernlab")

control_svm_rbf_fit <- control_svm_rbf_spec %>%
  fit(win ~ k_d + role + assists + damage_dealt + 
        player_spm + x2_piece + x3_piece + x4_piece +
        ctrl_firstbloods + ctrl_firstdeaths + ctrl_captures, 
      data = control2019_train)

augment(control_svm_rbf_fit, new_data = control2019_train) %>%
  conf_mat(truth = win, estimate = .pred_class)
##           Truth
## Prediction   0   1
##          0 883 227
##          1 177 835
control_svmAccuracy <- augment(control_svm_rbf_fit, new_data = control2019_train) %>%
  accuracy(truth = win, estimate = .pred_class)
control_svmAccuracy
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.810

Comparing Model Performance

control_accuracies <- c(control_dt_accuracy$.estimate,
                        control_rfAccuracy,
                        control_log_reg_acc$.estimate, 
                        control_svmAccuracy$.estimate)
control_accuracies
## [1] 0.7323280 0.7535344 0.7690858 0.8096136

Once again, the SVM model has the highest training accuracy, at 0.8096136.